Detrended fluctuation analysis of traffic data

Different routing strategies may result in different behaviors of
traffic on the internet. We analyze the correlation of traffic data
for three typical routing strategies by detrended fluctuation
analysis (DFA) and find that the degree of correlation of the data
can be divided into three regions, i.e., weak, medium, and strong
correlation. The DFA scaling exponents are constant in the regions of
weak and strong correlation but increase monotonically in the region
of medium correlation. We suggest that it is better to consider the
traffic on complex networks as having three phases, i.e., the free,
buffer, and congestion phases, rather than just the two phases
believed before, i.e., the free and congestion phases.

Undoubtedly, the internet has become a very important tool in our
daily life. Operations on the internet, such as browsing World Wide
Web (WWW) pages, sending messages by email, transferring files by
ftp, searching for information on a range of topics, and shopping,
have benefited us a lot. Therefore, sustaining its normal and
efficient functioning is a basic requirement. However, communication
on the internet does not always flow freely. Similar to traffic jams
on the highway, intermittent congestion on the internet has been
observed [1]. This phenomenon can also be observed in other
communication networks, such as the airline transportation network
or the postal service network. To reduce or control traffic
congestion in complex networks, a number of approaches have been
presented. Their routing strategies can be classified into two
classes according to whether or not the packets are delivered along
the shortest path.

The delivery time of a packet from its creation to its arrival at its
destination depends on the status of the internet and the routing
strategy. It is believed that there are two phases in communication,
i.e., the free and congestion phases. In the shortest-path routing
strategy, the delivery time equals the path length in the free phase
and becomes longer and longer with time in the congestion phase. In
the former, the delivery times of different packets are uncorrelated,
as each individual packet can go freely to its destination; in the
latter, they become correlated, as the waiting times are determined
by the packets accumulated along their paths. In non-shortest-path
routing strategies, the delivery times may differ between strategies
even in the free phase. As the internet has a power-law degree
distribution, nodes with many links tend to serve as intermediate
stations for packets and hence are easily congested. To reduce
congestion on these heavily linked nodes, a packet may be delivered
along a path that avoids them and hence is a little longer than the
shortest path [10, 11, 12]. Of course, packets still take the
shortest path if packets have not accumulated in the network.
Therefore, it is possible for the delivery times to be either
correlated or uncorrelated in the free phase. As the delivery times
are closely related to the degree of accumulation of packets in the
network, the correlation of delivery times is also reflected in the
time series of the number of packets in the network. A typical
shortest-path routing strategy is given by Liu et al. [9], and two
typical non-shortest-path routing strategies are given by Echenique
et al. [10, 11] and Zhang et al. [12]. Here we study the correlation
of packets produced by these three typical strategies.

As the traffic data are produced by all the nodes with some
randomness, the data exhibit erratic fluctuations, heterogeneity, and
nonstationarity. These features make the correlation difficult to
quantify. A conventional approach to measuring the correlation in
this situation is detrended fluctuation analysis (DFA), which can
reliably quantify scaling features in the fluctuations by filtering
out polynomial trends. The DFA method is based on the idea that a
correlated time series can be mapped to a self-similar process by
integration. Therefore, measuring the self-similar feature indirectly
gives information about the correlation properties. The DFA method
has been successfully applied to detect long-range correlations in
highly complex heartbeat time series [14], stock indices [15], and
other physiological signals [17]. In this paper, we use the DFA
method to measure the correlation of traffic data.

Most previous studies assume that the creation and delivery rates of
packets do not change from node to node. Considering that different
nodes in the internet have different capacities, a more realistic
assumption is that the packet creation and delivery rates at a node
are degree-dependent. This feature has recently been addressed by
Zhao et al. [6] and Liu et al. [9, 12]. They assume that the creation
and delivery rates of packets are $\lambda k_i$ and $(1 + \beta k_i)$,
respectively, where $k_i$ is the degree of node $i$, $\lambda$
represents the ability of a node with degree one to create packets,
the 1 in $(1 + \beta k_i)$ reflects the fact that a node can deliver
at least one packet at each time step, and $\beta$ denotes the
ability of a link to deliver packets. For a fixed $\lambda$, there is
a threshold $\beta_c$: the traffic is in the free phase when
$\beta > \beta_c$ and in the congestion phase when $\beta < \beta_c$.
Here we study how the scaling of the correlation changes with the
parameter $\beta$. We find that there is always a scaling in the DFA
of traffic data and that the scaling can be divided into three
regions, which implies the existence of three phases of traffic on
complex networks, i.e., the free, buffer, and congestion phases.
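
The degree-dependent rates can be illustrated with a minimal Python
sketch of a single node's queue. The function names are hypothetical,
and the stochastic rounding of fractional rates is an assumption made
only for illustration, since the text does not say how non-integer
values of the creation and delivery rates are handled.

```python
import math
import random


def node_rates(degree, lam, beta):
    """Return (creation rate, delivery capacity) = (lam*k, 1 + beta*k)."""
    return lam * degree, 1.0 + beta * degree


def stochastic_round(x, rng):
    """Round x to floor(x) or floor(x) + 1 so that the mean equals x.

    Assumption: this is one possible way to treat fractional rates;
    the paper does not specify the rule.
    """
    base = math.floor(x)
    return base + (1 if rng.random() < x - base else 0)


def queue_step(queue_len, arrivals, degree, lam, beta, rng):
    """Advance one node's queue by one time step.

    Returns (new queue length, number of packets sent out).
    """
    created = stochastic_round(lam * degree, rng)
    capacity = stochastic_round(1.0 + beta * degree, rng)
    queue_len += created + arrivals      # new packets join the queue
    sent = min(queue_len, capacity)      # at most `capacity` packets leave
    return queue_len - sent, sent
```

In this sketch a node whose average inflow stays below its capacity
keeps a bounded queue, while a node whose inflow exceeds it
accumulates packets, which is the single-node picture behind the two
conventional phases.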

We now construct a scale-free network with a total number of nodes
$N = 1000$ and an average degree $\langle k \rangle = 6$ according to
the algorithm given in Ref. [18], and let every node create
$\lambda k_i$ packets and send out at most $1 + \beta k_i$ packets at
each time step. The destinations of the created packets are chosen
randomly, and packets are sent out according to the
first-in-first-out rule. In the delivery process, newly created and
newly arrived packets are placed at the end of the queue of each
node. For Liu's shortest-path approach, we follow Ref. [9] to collect
the time series of traffic data. Figure 1 shows the evolution of the
average number of packets per node, $\langle n(t) \rangle$, in the
network, where the three lines from top to bottom represent the cases
$\beta = 0.06 < \beta_c$, $\beta = 0.061 \approx \beta_c$, and
$\beta = 0.1 > \beta_c$, respectively. Clearly, the number of packets
in the congestion phase, $\beta = 0.06$, increases linearly with time
$t$, while the numbers of packets in the free phase, $\beta = 0.061$
and $0.1$, fluctuate around different constants. For Echenique's
non-shortest-path approach, we follow Refs. [10, 11]: a packet at
node $i$ chooses one of its neighboring nodes, $\ell$, as its next
station according to the minimum value of
$\delta_\ell = h d_{\ell j} + (1 - h) n_\ell$, where $d_{\ell j}$ is
the shortest path length from the neighboring node $\ell$ to the
destination $j$ and $n_\ell$ is the number of packets accumulated at
node $\ell$. The parameter $h$ is a weighting factor, which can be
taken as a variational parameter, and $h \approx 0.8$ is found to
give the best performance. Echenique's approach thus accounts for the
waiting time only at the neighboring nodes. It was presented for the
case of equal creation and delivery rates at every node. For the
delivery rate $(1 + \beta k_i)$, a modified Echenique's approach is
to choose the neighboring node with the minimum value of the quantity
given in Eq. (1). We here choose $h = 0.85$ in Eq. (1) and find that
the traffic data behave similarly for different $\beta$ to those
shown in Fig. 1. For Zhang's non-shortest-path approach, we follow
Ref. [12] to collect the traffic data. For a packet at node $i$, we
take a node $\ell$ from the neighbors of node $i$ and label the
shortest path from node $\ell$ to the destination $j$ by
$\{SP : \ell, j\}$. Along this path, we evaluate the following
quantity for the node $\ell$:



$$ d(\ell) = \sum_{s \in \{SP : \ell, j\}} \frac{n_s}{1 + \beta k_s}, \qquad (2) $$



where the sum is over the nodes along the shortest path
$\{SP : \ell, j\}$, excluding the destination. Thus, $d(\ell)$ is an
estimate of the time that a packet would take to go from node $\ell$
to the destination $j$ through the shortest path. The node $\ell$
with the minimum $d(\ell)$ is chosen as the next station of the
packet at node $i$. We find that the traffic data again behave
similarly for different $\beta$ to those shown in Fig. 1.

FIG. 1: The average number of packets per node for $\lambda = 0.01$,
where the three lines from top to bottom represent the cases
$\beta = 0.06$, $0.061$, and $0.1$, respectively.
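
The two non-shortest-path selection rules can be compared in a short
Python sketch. It is only an illustration under stated assumptions:
the graph is a plain adjacency dictionary, all function names are
hypothetical, ties are broken by neighbor order, the Echenique rule is
the original unmodified form $h d_{\ell j} + (1 - h) n_\ell$ with
$h = 0.8$, and the Zhang rule reads each term of $d(\ell)$ as the
queue length divided by the delivery capacity, $n_s / (1 + \beta k_s)$.

```python
from collections import deque


def bfs_distances(adj, target):
    """Hop distance from every reachable node to `target`."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path(adj, start, target):
    """One shortest path from `start` to `target` (connected graph assumed)."""
    dist = bfs_distances(adj, target)
    path = [start]
    while path[-1] != target:
        # Move to the neighbor closest to the target (one hop closer).
        path.append(min(adj[path[-1]], key=lambda v: dist[v]))
    return path


def echenique_next_hop(adj, queues, i, j, h=0.8):
    """Neighbor l of i that minimizes h * d_lj + (1 - h) * n_l."""
    dist = bfs_distances(adj, j)
    return min(adj[i], key=lambda l: h * dist[l] + (1 - h) * queues[l])


def zhang_next_hop(adj, queues, i, j, beta):
    """Neighbor l of i that minimizes the estimated travel time d(l)."""
    def d(l):
        nodes = shortest_path(adj, l, j)[:-1]   # exclude the destination
        return sum(queues[s] / (1 + beta * len(adj[s])) for s in nodes)
    return min(adj[i], key=d)


# Toy ring of six nodes; node 3 holds a long queue two hops from node 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
queues = {0: 0, 1: 0, 2: 1, 3: 10, 4: 0, 5: 0}
print(echenique_next_hop(adj, queues, 0, 5))        # 1: only neighbor queues are seen
print(zhang_next_hop(adj, queues, 0, 5, beta=0.1))  # 2: the queue at node 3 is avoided
```

In the toy example the first rule sends the packet toward the
congested node because it inspects only the neighbors of node 0,
whereas the second rule sums the estimated waiting times along the
whole shortest path and picks the other branch.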

All three typical approaches show a common feature: there are two
kinds of data. The data in the congestion phase increase linearly
with $t$, and the data in the free phase fluctuate around a constant.
In order to quantify the correlations in the congestion phase, it is
important to remove the global trend. Therefore, we remove the
linearly increasing trend by subtracting the best-fit straight line
from the time series. This procedure makes the data in the congestion
phase behave similarly to those in the free phase. Figure 2 shows an
example of removing the global trend, where the upper line denotes
the original data with $\beta = 0.06 < \beta_c$ and the lower line
the data after removing the global trend.
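
Removing the global trend amounts to an ordinary least-squares line
fit followed by a subtraction. A minimal sketch in plain Python is
given below; the function name is hypothetical and the time axis is
assumed to be $t = 0, 1, 2, \dots$ in units of time steps.

```python
def remove_linear_trend(series):
    """Subtract the least-squares straight line from `series`.

    The time axis is taken as t = 0, 1, ..., len(series) - 1, and the
    series must contain at least two points.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    s_mean = sum(series) / n
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    cov = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(series))
    slope = cov / var_t
    intercept = s_mean - slope * t_mean
    return [s - (intercept + slope * t) for t, s in enumerate(series)]
```

The residuals have zero mean by construction, so a congestion-phase
series treated this way fluctuates around zero in the same manner as
a free-phase series fluctuates around its constant level.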

The DFA method is a modified root-mean-square (rms) analysis of a
random walk, and its algorithm consists of the following steps:
(1) Start with a signal $s(j)$, where $j = 1, \dots, N$ and $N$ is the
length of the signal, and integrate $s(j)$ to obtain

$$ y(i) = \sum_{j=1}^{i} [s(j) - \langle s \rangle], \qquad (3) $$

where $\langle s \rangle = \frac{1}{N} \sum_{j=1}^{N} s(j)$.

(2) Divide the integrated profile $y(i)$ into boxes of equal length
$m$. In each box, fit $y(i)$ by a least-squares fit to obtain its
local trend $y_{fit}(i)$.

(3) Detrend the integrated profile $y(i)$ by subtracting the local
trend $y_{fit}(i)$ in each box:

$$ Y_m(i) = y(i) - y_{fit}(i). \qquad (4) $$

(4) For a given box size $m$, calculate the rms fluctuation of the
integrated and detrended signal:

$$ F(m) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} [Y_m(i)]^2}. \qquad (5) $$

(5) Repeat this procedure for different box sizes $m$.
For scale-invariant signals with power-law correlations, there is a
power-law relationship between the rms fluctuation function $F(m)$
and the box size $m$:

$$ F(m) \sim m^{\alpha}. \qquad (6) $$

The scaling exponent $\alpha$ represents the degree of correlation in
the signal: the signal is uncorrelated for $\alpha = 0.5$ and
correlated for $\alpha > 0.5$.

FIG. 2: Removing the global trend of the congestion data for the
shortest-path approach, where the upper line denotes the original
data with $\beta = 0.06 < \beta_c$ and the lower line the data after
removing the global trend.
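
The DFA steps can be sketched in plain Python as follows. This is an
illustrative implementation under stated assumptions, not the code
used for the figures: the local trend is a straight line (first-order
DFA), points left over after the last whole box are dropped, the rms
is normalized by the number of points actually used, and all function
names are hypothetical.

```python
import math
import random


def linear_fit_residuals(box):
    """Residuals of a least-squares straight-line fit to one box."""
    m = len(box)
    x_mean = (m - 1) / 2
    y_mean = sum(box) / m
    sxx = sum((x - x_mean) ** 2 for x in range(m))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(box))
    slope = sxy / sxx
    return [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(box)]


def dfa_fluctuation(signal, m):
    """rms fluctuation F(m) of the integrated, box-wise detrended signal."""
    mean = sum(signal) / len(signal)
    profile, total = [], 0.0
    for s in signal:                      # step (1): integrate s(j) - <s>
        total += s - mean
        profile.append(total)
    n_boxes = len(profile) // m           # step (2): boxes of length m
    sq_sum = 0.0
    for b in range(n_boxes):
        box = profile[b * m:(b + 1) * m]
        # step (3): subtract the local linear trend in this box
        sq_sum += sum(r * r for r in linear_fit_residuals(box))
    return math.sqrt(sq_sum / (n_boxes * m))   # step (4): rms fluctuation


def dfa_exponent(signal, box_sizes):
    """Slope of log F(m) versus log m over the given box sizes (step 5)."""
    xs = [math.log(m) for m in box_sizes]
    ys = [math.log(dfa_fluctuation(signal, m)) for m in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx
```

For example, `dfa_exponent(noise, [16, 32, 64, 128, 256])` applied to
a few thousand independent Gaussian samples should give a value near
0.5, while the running sum of the same samples should give a value
near 1.5. When a crossover is present, the box sizes passed to
`dfa_exponent` should be restricted to the range below it.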

We now use the DFA method to quantify the correlation and scaling
properties of the fluctuating data with the global trend removed. For
the data collected in the three typical approaches, Fig. 3 shows how
the rms fluctuation function $F(m)$ changes with the box size $m$,
where the lines from top to bottom in each panel correspond to
increasing $\beta$, and (a) represents Liu's approach, (b)
Echenique's approach, and (c) Zhang's approach. It is easy to see
that all the lines are straight when $m$ is smaller than the
crossover point (shown by the arrows), indicating that there is a
scaling exponent $\alpha$ for each line. Comparing the lines with
different $\beta$, we see that $\alpha$ changes with $\beta$. The
relationship between $\alpha$ and $\beta$ is shown in Fig. 4, where
the lines with "squares", "circles", and "stars" denote Liu's,
Echenique's, and Zhang's approaches, respectively.

FIG. 3: $F(m)$ versus $m$ for different $\beta$, where the arrows
show the crossover points, the dashed lines show the slopes
(scalings) of $F(m)$ as a guide to the eye, and (a) represents Liu's
approach, (b) Echenique's approach, and (c) Zhang's approach.

FIG. 4: How the scaling exponent $\alpha$ changes with $\beta$, where
the lines with "squares", "circles", and "stars" denote Liu's,
Echenique's, and Zhang's approaches, respectively, and the three
dotted lines show the locations of $\beta_c$ for the three
approaches.

From Fig. 4 it is easy to see that $\alpha$ is approximately constant
in the congestion phase ($\beta < \beta_c$) in all three cases, where
the values of $\beta_c$ are marked by the three dashed lines, but
that $\alpha$ behaves differently in the free phase for the
shortest-path and non-shortest-path methods. In Liu's approach,
$\alpha$ is constant in the free phase, while in Echenique's and
Zhang's approaches, $\alpha$ is constant for $\beta > 0.061$ but
increases monotonically as $\beta$ decreases toward $\beta_c$. Let us
call the value of $\beta$ that separates the constant region from the
increasing region $\beta_1$, namely $\beta_1 = 0.061$. Then we have
$\beta_1 = \beta_c$ in Liu's approach and $\beta_1 > \beta_c$ in both
Echenique's and Zhang's approaches. For $\beta > \beta_1$, all the
values of $\alpha$ are close to 0.5; hence the corresponding traffic
data are approximately uncorrelated. For $\beta_c < \beta < \beta_1$,
the correlation in Echenique's and Zhang's approaches gradually
increases from short-range to long-range correlation, and it becomes
global for $\beta < \beta_c$. To distinguish the three different
behaviors, we call the corresponding traffic the free
($\beta > \beta_1$), buffer ($\beta_c < \beta < \beta_1$), and
congestion ($\beta < \beta_c$) phases.
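
The three-phase classification can be written as a small helper. This
is a sketch with a hypothetical function name; the text uses strict
inequalities, so the assignment of the boundary values
$\beta = \beta_c$ and $\beta = \beta_1$ to the upper phase is an
assumption made here for definiteness.

```python
def traffic_phase(beta, beta_c, beta_1):
    """Classify the traffic as 'free', 'buffer', or 'congestion'.

    beta_c is the congestion threshold and beta_1 >= beta_c is the value
    above which the scaling exponent stays close to 0.5.
    """
    if beta_1 < beta_c:
        raise ValueError("beta_1 must not be smaller than beta_c")
    if beta < beta_c:
        return "congestion"
    if beta < beta_1:
        return "buffer"
    return "free"
```

With $\beta_1 = \beta_c$, as in the shortest-path case, the function
never returns "buffer", which reproduces the conventional two-phase
picture.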

The relationship between the scaling exponent $\alpha$ and the
corresponding traffic behaviors can be understood as follows. In the
shortest-path strategy, as the accumulation of packets first occurs
at the hub nodes, the network can be considered congested once the
hub nodes are congested [9]. At that time, there are no accumulated
packets at the non-hub nodes. Hence, the critical value $\beta_c$ can
be determined from the condition that the average number of packets
on the hub equals $1 + \beta_c k_{\mathrm{hub}}$. Once congestion
occurs ($\beta < \beta_c$), the accumulation at the hub nodes
increases linearly and hence influences all the packets that take the
hub nodes as intermediate stations. As most of the shortest paths
pass through the hub nodes in scale-free networks, the influence of
their congestion is global and causes a significant change in
correlation when the traffic changes from the free to the congestion
phase. This is why we observe the jump of $\alpha$ in the line with
"squares" in Fig. 4.
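
The threshold condition for the shortest-path strategy can be written
out explicitly. The symbol $\langle n_{\mathrm{hub}} \rangle$ is
notation introduced here for the average number of packets on the hub
per time step; it is not defined in the text, and the expression below
is only a rearrangement of the stated balance condition.

$$ \langle n_{\mathrm{hub}} \rangle = 1 + \beta_c k_{\mathrm{hub}} \quad \Longrightarrow \quad \beta_c = \frac{\langle n_{\mathrm{hub}} \rangle - 1}{k_{\mathrm{hub}}}. $$

For $\beta$ below this value the hub receives on average more packets
than it can deliver, so its queue grows linearly with time.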

In the non-shortest-path strategies, by contrast, the accumulated
packets at the hub nodes for $\beta_c < \beta < \beta_1$ do not
increase linearly with time even when their average is slightly
larger than $1 + \beta k_{\mathrm{hub}}$, because incoming packets
choose other paths to reduce the delivery time. Therefore, for a
fixed creation rate, the packets take longer and longer paths to
avoid the congestion as the delivery parameter $\beta$ decreases.
With a further decrease of $\beta$, more and more nodes have an
average number of packets larger than $1 + \beta k_i$, i.e., there
are accumulated packets at these nodes. When the nodes with the
fewest links begin to accumulate packets, congestion occurs.
Therefore, the correlation among the packets becomes stronger and
stronger as the delivery parameter $\beta$ decreases. That is why we
observe the gradual increase of $\alpha$ in the lines with "circles"
and "stars" in Fig. 4. On the other hand, a packet chooses its path
from among all the nodes when $\beta < \beta_c$; thus the correlation
becomes global. For $\beta > \beta_1$, the average number of packets
at the hub nodes does not exceed $1 + \beta k_{\mathrm{hub}}$, but
fluctuations may sometimes push the number of packets above
$1 + \beta k_{\mathrm{hub}}$. When this happens, incoming packets
choose other paths to avoid accumulation at the hub nodes, resulting
in a short-range correlation even in the free phase. This is why the
"circles" and "stars" are a little higher than the "squares" for
$\beta > \beta_1$ in Fig. 4. In sum, one difference between the
buffer phase and the free phase is that the average number of packets
on a node exceeds $1 + \beta k_i$ in the former but not in the
latter; and one difference between the buffer phase and the
congestion phase is that the average number of packets on a node
increases linearly with time in the congestion phase but not in the
buffer phase.

In conclusion, we have investigated the correlation of traffic data
for three typical routing strategies by the DFA method. We find that
there are two phases in the shortest-path strategy but three phases
in the non-shortest-path strategies. The buffer phase comes from the
fact that incoming packets take slightly longer paths through nodes
with few links to avoid heavy accumulation at the hub nodes. The
average number of packets in the buffer phase is larger than that in
the free phase but does not increase linearly with time. The finding
of the buffer phase may shed light on further studies of
communication on the internet.